SYNTHETIC INSECTICIDAL CRYSTAL PROTEIN GENE



FIELD OF THE INVENTION 

This invention relates to the field of bacterial molecular biology and, in particular, to genetic engineering
by recombinant technology for the purpose of protecting plants from insect pests. Disclosed herein are the
chemical synthesis of a modified crystal protein gene from Bacillus thuringiensis var. tenebrionis (Btt), and
the selective expression of this synthetic insecticidal gene. Also disclosed is the transfer of the cloned
synthetic gene into a host microorganism, rendering the organism capable of producing, at improved levels
of expression, a protein having toxicity to insects. This invention facilitates the genetic engineering of
bacteria and plants to attain desired expression levels of novel toxins having agronomic value.


BACKGROUND OF THE INVENTION 

B. thuringiensis (Bt) is unique in its ability to produce, during the process of sporulation, proteinaceous,
crystalline inclusions which are found to be highly toxic to several insect pests of agricultural importance.
The crystal proteins of different Bt strains have a rather narrow host range and hence are used
commercially as very selective biological insecticides. Numerous strains of Bt are toxic to lepidopteran and
dipteran insects. Recently, two subspecies (or varieties) of Bt have been reported to be pathogenic to
coleopteran insects: var. tenebrionis (Krieg et al. (1983) Z. Angew. Entomol. 96:500-508) and var. san diego
(Herrnstadt et al. (1986) Biotechnol. 4:305-308). Both strains produce flat, rectangular crystal inclusions and
have a major crystal component of 64-68 kDa (Herrnstadt et al. supra; Bernhard (1986) FEMS Microbiol.
Lett. 33:261-265).

Toxin genes from several subspecies of Bt have been cloned, and the recombinant clones were found to
be toxic to lepidopteran and dipteran insect larvae. The two coleopteran-active toxin genes have also been
isolated and expressed. Herrnstadt et al. supra cloned a 5.8 kb BamHI fragment of Bt var. san diego DNA.
The protein expressed in E. coli was toxic to P. luteola (elm leaf beetle) and had a molecular weight of
approximately 83 kDa. This 83 kDa toxin product from the var. san diego gene was larger than the 64 kDa
crystal protein isolated from Bt var. san diego cells, suggesting that the Bt var. san diego crystal protein
may be synthesized as a larger precursor molecule that is processed by Bt var. san diego, but not by E. coli,
prior to being formed into a crystal.

Sekar et al. (1987) Proc. Nat. Acad. Sci. USA 84:7036-7040 (also U.S. Patent Application 108,285, filed
October 13, 1987) isolated the crystal protein gene from Btt and determined its nucleotide sequence. This
crystal protein gene was contained on a 5.9 kb BamHI fragment (pNSBP544). A subclone containing the
3 kb HindIII fragment from pNSBP544 was constructed. This HindIII fragment contains an open reading frame
(ORF) that encodes a 644-amino acid polypeptide of approximately 73 kDa. Extracts of both subclones
exhibited toxicity to larvae of Colorado potato beetle (Leptinotarsa decemlineata, a coleopteran insect).
73- and 65-kDa peptides that cross-reacted with an antiserum against the crystal protein of var. tenebrionis
were produced on expression in E. coli. Sporulating var. tenebrionis cells contain an immunoreactive 73-kDa
peptide that corresponds to the expected product from the ORF of pNSBP544. However, isolated crystals
primarily contain a 65-kDa component. When the crystal protein gene was shortened at the N-terminal
region, the dominant protein product obtained was the 65-kDa peptide. A deletion derivative, p544Pst-Met5,
was enzymatically derived from the 5.9 kb BamHI fragment upon removal of forty-six amino acid residues
from the N-terminus. Expression of the N-terminal deletion derivative, p544Pst-Met5, resulted in the
production of, almost exclusively, the 65 kDa protein. Recently, McPherson et al. (1988) Biotechnology
6:61-66 demonstrated that the Btt gene contains two functional translation initiation codons in the same
reading frame, leading to the production of both the full-length protein and an N-terminal truncated form.

Chimeric toxin genes from several strains of Bt have been expressed in plants. Four modified Bt2
genes from var. berliner 1715, under the control of the 2' promoter of the Agrobacterium TR-DNA, were
transferred into tobacco plants (Vaeck et al. (1987) Nature 328:33-37). Insecticidal levels of toxin were
produced when truncated genes were expressed in transgenic plants. However, the steady-state mRNA
levels in the transgenic plants were so low that they could not be reliably detected in Northern blot analysis
and hence were quantified using ribonuclease protection experiments. Bt mRNA levels in plants producing
the highest level of protein corresponded to «0.0001% of the poly(A)+ mRNA. In the report by Vaeck et al.
(1987) supra, expression of chimeric genes containing the entire coding sequence of Bt2 was compared to
that of chimeric genes containing truncated Bt2 genes. Additionally, some T-DNA constructs included a
chimeric NPTII gene as a marker selectable in plants, whereas other constructs carried translational fusions
between fragments of Bt2 and the NPTII gene. Insecticidal levels of toxin were produced when truncated
genes or fusion constructs were expressed in transgenic plants. Greenhouse-grown plants produced
1-0.02% of the total soluble protein as the toxin, or 3 µg of toxin per g fresh leaf tissue and, even at
five-fold lower levels, showed 100% mortality in six-day feeding assays. However, no significant insecticidal
activity could be obtained using the intact Bt2 coding sequence, despite the fact that the same promoter
was used to direct its expression. Intact Bt2 protein and RNA yields in the transgenic plant leaves were
10-50 times lower than those for the truncated Bt2 polypeptides or fusion proteins.

Barton et al. (1987) Plant Physiol. 85:1103-1109 showed expression of a Bt protein in a system
containing a 35S promoter, a viral (TMV) leader sequence, the Bt HD-1 4.5 kb gene (encoding a 645-amino
acid protein followed by two proline residues) and a nopaline synthase (nos) poly(A)+ sequence. Under
these conditions, expression was observed for Bt mRNA at levels up to 47 pg/20 µg RNA and 12 ng/mg
plant protein. This amount of Bt protein in plant tissue produced 100% mortality in jo days. This level of
expression still represents a low level of mRNA (2.5 × 10^-4 %) and protein (1.2 × 10^-3 %).
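The percentages quoted for Barton et al. follow from the reported ratios by unit conversion alone. A minimal Python sketch of that arithmetic, assuming the mRNA figure is taken relative to the 20 µg of RNA and the protein figure relative to 1 mg of plant protein:

```python
# Unit-conversion sketch for the Barton et al. (1987) figures quoted above.
PG_PER_UG = 1e6  # 1 microgram = 10^6 picograms
NG_PER_MG = 1e6  # 1 milligram = 10^6 nanograms


def percent(part: float, whole: float) -> float:
    """Express part/whole as a percentage; both must be in the same unit."""
    return 100.0 * part / whole


# 47 pg of Bt mRNA per 20 ug of RNA, with both amounts converted to pg.
mrna_pct = percent(47, 20 * PG_PER_UG)
# 12 ng of Bt protein per 1 mg of plant protein, with both amounts in ng.
protein_pct = percent(12, 1 * NG_PER_MG)

print(f"Bt mRNA:    {mrna_pct:.2e} % of RNA")
print(f"Bt protein: {protein_pct:.2e} % of plant protein")
```

This gives about 2.35 × 10^-4 % for mRNA and 1.2 × 10^-3 % for protein, the same orders of magnitude as the figures quoted in the text.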

Various hybrid proteins consisting of N-terminal fragments of increasing length of the Bt2 protein fused
to NPTII were produced in E. coli by Hofte et al. (1988) FEBS Lett. 226:364-370. Fusion proteins containing
the first 607 amino acids of Bt2 exhibited insect toxicity; fusion proteins not containing this minimum
N-terminal fragment were nontoxic. Appearance of NPTII activity was not dependent upon insecticidal
activity; however, the conformation of the Bt2 polypeptide appeared to exert an influence on the enzymatic
activity of the fused NPTII protein. This study did suggest that the global 3-D structure of the Bt2
polypeptide is disturbed in truncated polypeptides.

A number of researchers have attempted to express plant genes in yeast (Neill et al. (1987) Gene
55:303-317; Rothstein et al. (1987) Gene 55:353-356; Coraggio et al. (1986) EMBO J. 5:459-465) and E. coli
(Fuzakawa et al. (1987) FEBS Lett. 224:125-127; Vies et al. (1986) EMBO J. 5:2439-2444; Gatenby et al.
(1987) Eur. J. Biochem. 168:227-231). In the case of wheat α-gliadin (Neill et al. (1987) supra) and
α-amylase (Rothstein et al. (1987) supra) genes, and maize zein genes (Coraggio et al. (1986) supra) in yeast,
low levels of expression have been reported. Neill et al. have suggested that the low level of α-gliadin
expression in yeast may be due in part to codon usage bias, since α-gliadin codons for Phe, Leu, Ser, Gly, Tyr
and especially Glu do not correlate well with the abundant yeast isoacceptor tRNAs. In E. coli, ... A2
(Fuzakawa et al. (1987) supra) and wheat RuBPC SSU (Vies et al. (1986) supra; Gatenby et al. (1987)
supra) are expressed adequately.
Not much is known about the makeup of tRNA populations in plants. Viotti et al. (1978) Biochim.
Biophys. Acta 517:125-132 report that maize endosperm actively synthesizing zein, a protein rich in
glutamine, leucine, and alanine, is characterized by higher levels of accepting activity for these three amino
acids than are maize embryo tRNAs. This may indicate that the tRNA population of specific plant tissues
is adapted for optimum translation of highly expressed proteins such as zein. To our knowledge, no
one has experimentally altered codon bias in highly expressed plant genes to determine its possible effects
on protein translation and on the level of expression in plants.
SUMMARY OF THE INVENTION



It is the overall object of the present invention to provide a means for plant protection against insect
damage. The invention disclosed herein comprises a chemically synthesized gene encoding an insecticidal
protein which is functionally equivalent to a native insecticidal protein of Bt. This synthetic gene is designed
to be expressed in plants at a level higher than a native Bt gene. It is preferred that the synthetic gene be
designed to be highly expressed in plants as defined herein. Preferably, the synthetic gene is at least
approximately 85% homologous to an insecticidal protein gene of Bt.
It is a particular object of this invention to provide a synthetic structural gene coding for an insecticidal
protein from Btt having, for example, the nucleotide sequences presented in Figure 1 and spanning
nucleotides 1 through 1793 or spanning nucleotides 1 through 1833, and functionally equivalent sequences.
In designing synthetic Btt genes of this invention for enhanced expression in plants, the DNA sequence
of the native Btt structural gene is modified in order to contain codons preferred by highly expressed plant
genes, to attain an A+T content in nucleotide base composition substantially that found in plants, and also
preferably to form a plant initiation sequence, to eliminate sequences that cause destabilization,
inappropriate polyadenylation, degradation and termination of RNA, and to avoid secondary structure
hairpins and RNA splice sites. In the synthetic genes, codons used to specify a given amino acid are
selected with regard to the distribution frequency of codon usage employed in highly expressed plant genes
to specify that amino acid. As is appreciated by those skilled in the art, the distribution frequency of codon
usage utilized in the synthetic gene is a determinant of the level of expression. Hence, the synthetic gene is
designed such that its distribution frequency of codon usage deviates, preferably, no more than 25% from
that of highly expressed plant genes and, more preferably, no more than about 10%. In addition,
consideration is given to the percentage G+C content of the degenerate third base (monocotyledons appear
to favor G+C in this position, whereas dicotyledons do not). It is also recognized that the XCG codon is the
least preferred codon in dicots, whereas the XTA codon is avoided in both monocots and dicots. The
synthetic genes of this invention also preferably have CG and TA doublet avoidance indices, as defined in
the Detailed Description, closely approximating those of the chosen host plant. More preferably, these
indices deviate from those of the host by no more than about 10-15%.
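One way to select codons "with regard to the distribution frequency" of a host is to apportion each amino acid's occurrences among its synonymous codons in proportion to the host frequencies. The Python sketch below does this with the largest-remainder method; both that method and the valine frequencies in the hypothetical `HOST_FREQ` table are illustrative assumptions, not the procedure or Table 1 values of this invention, and a real design would also screen the result for unwanted sequences.

```python
from collections import Counter

# Hypothetical host codon frequencies (illustrative only, not Table 1 values).
HOST_FREQ = {
    "M": {"ATG": 1.00},
    "V": {"GTT": 0.40, "GTG": 0.30, "GTC": 0.18, "GTA": 0.12},
}


def apportion(n, freqs):
    """Split n occurrences among codons in proportion to freqs
    using the largest-remainder method (frequencies sum to 1)."""
    raw = {codon: n * f for codon, f in freqs.items()}
    counts = {codon: int(r) for codon, r in raw.items()}
    leftover = max(0, n - sum(counts.values()))
    # Hand the remaining slots to the codons with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda c: raw[c] - counts[c], reverse=True)
    for codon in by_remainder[:leftover]:
        counts[codon] += 1
    return counts


def back_translate(protein, host_freq):
    """Return a coding sequence whose per-amino-acid codon counts follow
    the host frequencies as closely as whole numbers allow."""
    pools = {}
    for aa, n in Counter(protein).items():
        counts = apportion(n, host_freq[aa])
        pools[aa] = [codon for codon, k in counts.items() for _ in range(k)]
    return "".join(pools[aa].pop() for aa in protein)


gene = back_translate("M" + "V" * 50, HOST_FREQ)
codons = [gene[i:i + 3] for i in range(0, len(gene), 3)]
print(Counter(codons))
```

Here the 50 valines receive 20 GTT, 15 GTG, 9 GTC and 6 GTA codons, matching the assumed frequencies exactly; the codons are grouped rather than interleaved, which a practical design would likely change.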

Assembly of the Bt gene of this invention is performed using standard technology known to the art. The
Btt structural gene designed for enhanced expression in plants of the specific embodiment is enzymatically
assembled within a DNA vector from chemically synthesized oligonucleotide duplex segments. The
synthetic Bt gene is then introduced into a plant host cell and expressed by means known to the art. The
insecticidal protein produced upon expression of the synthetic Bt gene in plants is functionally equivalent to
a native Bt crystal protein in having toxicity to the same insects.



BRIEF DESCRIPTION OF THE FIGURES 


Figure 1 presents the nucleotide sequence for the synthetic Btt gene. Where different, the native
sequence as found in p544Pst-Met5 is shown above. Changes in amino acids (underlined) occur in the
synthetic sequence, with alanine replacing threonine at residue 2 and leucine replacing the stop at residue
596, followed by the addition of 13 amino acids at the C-terminus.

Figure 2 represents a simplified scheme used in the construction of the synthetic Btt gene. Segments
A through M represent oligonucleotide pieces annealed and ligated together to form DNA duplexes having
unique splice sites to allow specific enzymatic assembly of the DNA segments to give the desired gene.

Figure 3 is a schematic diagram showing the assembly of oligonucleotide segments in the construction
of a synthetic Btt gene. Each segment (A through M) is built from oligonucleotides of different sizes,
annealed and ligated to form the desired DNA segment.

DETAILED DESCRIPTION OF THE INVENTION 

The following definitions are provided to clarify the intent and scope of the terms as used in the
Specification and claims.

Expression refers to the transcription and translation of a structural gene to yield the encoded protein.
The synthetic Bt genes of the present invention are designed to be expressed at a higher level in plants
than the corresponding native Bt genes. As will be appreciated by those skilled in the art, structural gene
expression levels are affected by the regulatory DNA sequences (promoter, polyadenylation sites, enhancers,
etc.) employed and by the host cell in which the structural gene is expressed. Comparisons of
synthetic Bt gene expression and native Bt gene expression must be made employing analogous regulatory
sequences and in the same host cell. It will also be apparent that analogous means of assessing gene
expression must be employed in such comparisons.

Promoter refers to the nucleotide sequences at the 5' end of a structural gene which direct the initiation
of transcription. Promoter sequences are necessary, but not always sufficient, to drive the expression of a
downstream gene. In prokaryotes, the promoter drives transcription by providing binding sites for RNA
polymerases and other initiation and activation factors. Usually promoters drive transcription preferentially in
the downstream direction, although promotional activity can be demonstrated (at a reduced level of
expression) when the gene is placed upstream of the promoter. The level of transcription is regulated by
promoter sequences. Thus, in the construction of heterologous promoter/structural gene combinations, the
structural gene is placed under the regulatory control of a promoter such that the expression of the gene is
controlled by promoter sequences. The promoter is positioned preferentially upstream of the structural gene
and at a distance from the transcription start site that approximates the distance between the promoter and
the gene it controls in its natural setting. As is known in the art, some variation in this distance can be
tolerated without loss of promoter function.

A gene refers to the entire DNA portion involved in the synthesis of a protein. A gene embodies the
structural or coding portion, which begins at the 5' end from the translational start codon (usually ATG) and
extends to the stop (TAG, TGA or TAA) codon at the 3' end. It also contains a promoter region, usually
located 5' or upstream of the structural gene, which initiates and regulates the expression of a structural
gene. Also included in a gene are the 3' end and poly(A)+ addition sequences.

Structural gene is that portion of a gene comprising a DNA segment encoding a protein, polypeptide or
a portion thereof, and excluding the 5' sequence which drives the initiation of transcription. The structural
gene may be one which is normally found in the cell or one which is not normally found in the cellular
location wherein it is introduced, in which case it is termed a heterologous gene. A heterologous gene may
be derived in whole or in part from any source known to the art, including a bacterial genome or episome,
eukaryotic, nuclear or plasmid DNA, cDNA, viral DNA or chemically synthesized DNA. A structural gene
may contain one or more modifications in either the coding or the untranslated regions which could affect
the biological activity or the chemical structure of the expression product, the rate of expression or the
manner of expression control. Such modifications include, but are not limited to, mutations, insertions,
deletions and substitutions of one or more nucleotides. The structural gene may constitute an uninterrupted
coding sequence or it may include one or more introns, bounded by the appropriate splice junctions. The
structural gene may be a composite of segments derived from a plurality of sources, naturally occurring or
synthetic. The structural gene may also encode a fusion protein.

Synthetic gene refers to a DNA sequence of a structural gene that is chemically synthesized in its
entirety or for the greater part of the coding region. As exemplified herein, oligonucleotide building blocks
are synthesized using procedures known to those skilled in the art and are annealed and ligated to form
gene segments, which are then enzymatically assembled to construct the entire gene. As is recognized by
those skilled in the art, functionally and structurally equivalent genes to the synthetic genes described
herein may be prepared by site-specific mutagenesis or other related methods used in the art.

Transforming refers to stably introducing a DNA segment carrying a functional gene into an organism 
that did not previously contain that gene. 

Plant tissue includes differentiated and undifferentiated tissues of plants, including but not limited to,
roots, shoots, leaves, pollen, seeds, tumor tissue and various forms of cells in culture, such as single cells,
protoplasts, embryos and callus tissue. The plant tissue may be in planta or in organ, tissue or cell culture.
Plant cell as used herein includes plant cells in planta and plant cells and protoplasts in culture.

Homology refers to identity or near identity of nucleotide or amino acid sequences. As is understood in
the art, nucleotide mismatches can occur at the third or wobble base in the codon without causing amino
acid substitutions in the final polypeptide sequence. Also, minor nucleotide modifications (e.g., substitutions,
insertions or deletions) in certain regions of the gene sequence can be tolerated and considered
insignificant whenever such modifications result in changes in amino acid sequence that do not alter
functionality of the final product. It has been shown that chemically synthesized copies of whole, or parts of,
gene sequences can replace the corresponding regions in the natural gene without loss of gene function.
Homologs of specific DNA sequences may be identified by those skilled in the art using the test of
cross-hybridization of nucleic acids under conditions of stringency as is well understood in the art (as
described in Hames and Higgins (eds.) (1985) Nucleic Acid Hybridization, IRL Press, Oxford, UK). Extent of
homology is often measured in terms of percentage of identity between the sequences compared.

Functionally equivalent refers to identity or near identity of function. A synthetic gene product which is
toxic to at least one of the same insect species as a natural Bt protein is considered functionally equivalent
thereto. As exemplified herein, both natural and synthetic Btt genes encode 65 kDa insecticidal proteins
having essentially identical amino acid sequences and having toxicity to coleopteran insects. The synthetic
Bt genes of the present invention are not considered to be functionally equivalent to native Bt genes, since
they are expressible at a higher level in plants than native Bt genes.

Frequency of preferred codon usage refers to the preference exhibited by a specific host cell in usage
of nucleotide codons to specify a given amino acid. To determine the frequency of usage of a particular
codon in a gene, the number of occurrences of that codon in the gene is divided by the total number of
occurrences of all codons specifying the same amino acid in the gene. Table 1, for example, gives the
frequency of codon usage for Bt genes, which was obtained by analysis of four Bt genes whose sequences
are publicly available. Similarly, the frequency of preferred codon usage exhibited by a host cell can be
calculated by averaging the frequency of preferred codon usage in a large number of genes expressed by the
host cell. It is preferable that this analysis be limited to genes that are highly expressed by the host cell.
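The per-gene calculation just described (occurrences of a codon divided by occurrences of all codons for the same amino acid) can be sketched in Python as follows. The standard genetic code is built inline; the short test sequence is hypothetical, and a host-cell profile would then be obtained by averaging such per-gene values over many highly expressed genes.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_usage(seq):
    """Frequency of each codon relative to all codons for the same amino acid."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Count sense codons only; stop codons ('*') do not specify an amino acid.
    counts = Counter(c for c in codons if CODE.get(c, "*") != "*")
    per_aa = defaultdict(int)
    for codon, n in counts.items():
        per_aa[CODE[codon]] += n
    return {codon: n / per_aa[CODE[codon]] for codon, n in counts.items()}


usage = codon_usage("ATGGTAGTTGTTGTGTGGTAA")
print(usage)
```

In this toy sequence the four valine codons split as GTT 0.5, GTA 0.25 and GTG 0.25, while the unique codons ATG and TGG each have frequency 1.0.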

Table 1, for example, gives the frequency of codon usage by highly expressed genes exhibited by
dicotyledonous plants and monocotyledonous plants. The dicot codon usage was calculated using 154
highly expressed coding sequences obtained from GenBank, which are listed in Table 1. Monocot codon
usage was calculated using 53 monocot nuclear gene coding sequences obtained from GenBank and listed
in Table 1, located in Example 1.

When synthesizing a gene for improved expression in a host cell, it is desirable to design the gene such
that its frequency of codon usage approaches the frequency of preferred codon usage of the host cell.

The percent deviation of the frequency of preferred codon usage for a synthetic gene from that
employed by a host cell is calculated by first determining the percent deviation of the frequency of usage of
a single codon from that of the host cell, and then obtaining the average deviation over all codons. As
defined herein, this calculation includes unique codons (i.e., ATG and TGG). The frequency of preferred
codon usage of the synthetic Btt gene, whose sequence is given in Figure 1, is given in Table 1. The
frequency of preferred usage of the codon GTA for valine in the synthetic gene (0.10) deviates from that
preferred by dicots (0.12) by 0.02/0.12 = 0.167, or 16.7%. The average deviation over all amino acid
codons of the Btt synthetic gene codon usage from that of dicot plants is 7.8%. In general terms, the overall
average deviation of the codon usage of a synthetic gene from that of a host cell is calculated using the
equation

$$\text{average deviation (\%)} = \frac{1}{Z}\sum_{n=1}^{Z}\frac{\lvert X_n - Y_n\rvert}{X_n} \times 100$$

where X_n = frequency of usage for codon n in the host cell and Y_n = frequency of usage for codon n in the
synthetic gene. Here n represents an individual codon that specifies an amino acid, and the total number of
codons is Z, which in the preferred embodiment is 61. The overall deviation of the frequency of codon
usage for all amino acids should preferably be less than about 25%, and more preferably less than about
10%.
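The average-deviation equation can be sketched in Python as below. The GTA values (0.12 for dicots, 0.10 for the synthetic gene) come from the text; the other valine frequencies are hypothetical, and only six codons are used instead of 61. Skipping codons with X_n = 0 is an assumption made to avoid division by zero, since the text does not address that case.

```python
def codon_deviation(x_host, y_synth):
    """Percent deviation of one codon's usage frequency from the host's."""
    return abs(x_host - y_synth) / x_host * 100.0


def average_deviation(host, synthetic):
    """Mean of the per-codon percent deviations over the host's codons.

    `host` and `synthetic` map codon -> usage frequency within its amino acid
    (X_n and Y_n). Codons with X_n == 0 are skipped (assumption, see text).
    """
    terms = [codon_deviation(x, synthetic.get(codon, 0.0))
             for codon, x in host.items() if x > 0]
    return sum(terms) / len(terms)


# GTA values from the text; all other values are hypothetical.
host = {"GTA": 0.12, "GTT": 0.40, "GTC": 0.18, "GTG": 0.30, "ATG": 1.0, "TGG": 1.0}
synthetic = {"GTA": 0.10, "GTT": 0.40, "GTC": 0.20, "GTG": 0.30, "ATG": 1.0, "TGG": 1.0}

print(round(codon_deviation(0.12, 0.10), 1))
print(round(average_deviation(host, synthetic), 2))
```

The first line reproduces the 16.7% deviation for GTA worked out above; the unique codons ATG and TGG contribute zero deviation but still count toward Z, as the definition requires.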

Derived from is used to mean taken, obtained, received, traced, replicated or descended from a source
(chemical and/or biological). A derivative may be produced by chemical or biological manipulation
(including but not limited to substitution, addition, insertion, deletion, extraction, isolation, mutation and
replication) of the original source.

Chemically synthesized, as related to a sequence of DNA, means that the component nucleotides were
assembled in vitro. Manual chemical synthesis of DNA may be accomplished using well-established
procedures (Caruthers, M. (1983) in Methodology of DNA and RNA Sequencing, Weissman (ed.), Praeger
Publishers, New York, Chapter 1), or automated chemical synthesis can be performed using one of a
number of commercially available machines.

The term designed to be highly expressed, as used herein, refers to a level of expression of a designed
gene wherein the amount of its specific mRNA transcripts produced is sufficient to be quantified in Northern
blots and, thus, represents a level of specific mRNA expressed corresponding to greater than or equal to
approximately 0.001% of the poly(A)+ mRNA. To date, natural Bt genes are transcribed at a level wherein
the amount of specific mRNA produced is insufficient to be estimated using the Northern blot technique.
However, in the present invention, transcription of a synthetic Bt gene designed to be highly expressed not
only allows quantification of the specific mRNA transcripts produced but also results in enhanced
expression of the translation product, which is measured in insecticidal bioassays.

Crystal protein or insecticidal crystal protein or crystal toxin refers to the major protein component of
the parasporal crystals formed in strains of Bt. This protein component exhibits selective pathogenicity to
different species of insects. The molecular size of the major protein isolated from parasporal crystals varies
depending on the strain of Bt from which it is derived. Crystal proteins having molecular weights of
approximately 132, 65, and 28 kDa have been reported. It has been shown that the approximately 132 kDa
protein is a protoxin that is cleaved to form an approximately 65 kDa toxin.

The crystal protein gene refers to the DNA sequence encoding the insecticidal crystal protein in either 
full length protoxin or toxin form, depending on the strain of Bt from which the gene is derived. 

The authors of this invention observed that expression in plants of Bt crystal protein mRNA occurs at
levels that are not routinely detectable in Northern blots and that low levels of Bt crystal protein expression
correspond to this low level of mRNA expression. For exploitation of these genes as potential biocontrol
methods, it is preferred that the level of expression of Bt genes in plant cells be improved and that the
stability of Bt mRNA in plants be optimized. This will allow greater levels of Bt mRNA to accumulate and will
result in an increase in the amount of insecticidal protein in plant tissue. This is essential for the control of
insects that are relatively resistant to Bt protein.

Thus, this invention is based on the recognition that expression levels of desired, recombinant 
insecticidal protein in transgenic plants can be improved via increased expression of stabilized mRNA 
transcripts; and that, conversely, detection of these stabilized RNA transcripts may be utilized to measure 
... plants, modifications in the coding region of the gene were studied using site-specific mutagenesis, by
... synthetic DNA duplexes containing the desired nucleotide changes (Lo et al. (1984) Proc. Nat. Acad. Sci.
USA 81:2285-2289).

However, recent advances in recombinant DNA technology ... chemically synthesize an entire gene
designed specifically for a desired ... Gene synthesis provides ... of appropriately positioned restriction
sites ... protein toxic to an insect. As ... achieve improved levels of cross-expression ... modified gene ...
consists of designing an improved nucleotide sequence for the coding region and ... from chemically
synthesized oligonucleotide segments. In ... encoding a 65 kDa polypeptide ... highly expressed proteins of
the host cell are ... the level of expression of the ...

Bias in codon choice within genes in a single species appears related to ... to consider when engineering
high expression of heterologous genes in yeast and ... Experimental evidence obtained from point mutations
... has indicated ... processing ... RNA destabilization ...
and most preferably is about 55%. Also, for ultimate expression in plants, the synthetic gene nucleotide
sequence is preferably modified to form a plant initiation sequence at the 5' end of the coding region. In
addition, particular attention is preferably given to assure that unique restriction sites are placed in strategic
positions to allow efficient assembly of oligonucleotide segments during construction of the synthetic gene
and to facilitate subsequent nucleotide modification. As a result of these modifications in the coding region
of the native Bt gene, the preferred synthetic gene is expressed in plants at an enhanced level when
compared to that observed with natural Bt structural genes.
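Two of the sequence checks mentioned above, overall base composition (the A+T criterion) and uniqueness of restriction sites, can be sketched in Python. The BamHI (GGATCC) and HindIII (AAGCTT) recognition sequences and the short test sequence are illustrative choices, not sites or sequence taken from the synthetic Btt gene.

```python
# Illustrative recognition sequences (both are palindromic).
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}


def at_content(seq):
    """Percentage of A+T in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_site(seq, site):
    """Number of positions where a site occurs on either strand (overlaps allowed)."""
    seq = seq.upper()
    hits = set()
    for motif in {site.upper(), revcomp(site)}:
        hits |= {i for i in range(len(seq) - len(motif) + 1)
                 if seq.startswith(motif, i)}
    return len(hits)


seq = "ATGGCTGGATCCAAGCTTAAGCTTTAA"
print(f"A+T = {at_content(seq):.1f}%")
for name, site in SITES.items():
    n = count_site(seq, site)
    print(name, n, "unique" if n == 1 else "not unique")
```

In this toy sequence the BamHI site is unique but HindIII occurs twice, so only BamHI would serve as an assembly junction.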

In specific embodiments, the synthetic Bt gene of this invention encodes a Btt protein toxic to 
coleopteran insects. Preferably, the toxic polypeptide is about 598 amino acids in length, is at least 75% 
homologous to a Btt polypeptide, and, as exemplified herein, is essentially identical to the protein encoded 
by p544Pst-Met5, except for replacement of threonine by alanine at residue 2. This amino acid substitution 
results as a consequence of the necessity to introduce a guanine base at position +4 in the coding 
sequence. 
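The Thr-to-Ala change at residue 2 follows from the genetic code: position +4 is the first base of codon 2, and every threonine codon has the form ACN, so changing that base to G while keeping the other two bases gives GCN, all four of which encode alanine. No codon beginning with G encodes threonine (GNN codons specify Val, Ala, Asp, Glu or Gly), so the substitution cannot be avoided. A small Python check over the relevant subset of the standard code:

```python
# Subset of the standard genetic code: threonine (ACN) and alanine (GCN).
CODE_SUBSET = {**{"AC" + b: "Thr" for b in "TCAG"},
               **{"GC" + b: "Ala" for b in "TCAG"}}


def residue2_with_g_at_plus4(codon2):
    """Amino acid encoded by codon 2 after its first base (+4) becomes G."""
    return CODE_SUBSET["G" + codon2[1:]]


for thr_codon in ("ACT", "ACC", "ACA", "ACG"):
    new_codon = "G" + thr_codon[1:]
    print(f"ATG {thr_codon} ({CODE_SUBSET[thr_codon]}) -> "
          f"ATG {new_codon} ({residue2_with_g_at_plus4(thr_codon)})")
```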

In designing the synthetic gene of this invention, the coding region from the Btt subclone p544Pst-Met5,
encoding a 65 kDa polypeptide having coleopteran toxicity, is scanned for possible modifications
which would result in improved expression of the synthetic gene in plants. For example, in preferred
embodiments, the synthetic insecticidal protein is strongly expressed in dicot plants, e.g., tobacco, tomato,
cotton, etc., and hence, a synthetic gene under these conditions is designed to incorporate codons used
preferentially by highly expressed dicot proteins. In embodiments where enhanced expression
of insecticidal protein is desired in a monocot, codons preferred by highly expressed monocot proteins
(given in Table 1) are employed in designing the synthetic gene.

In general, genes within a taxonomic group exhibit similarities in codon choice, regardless of the 
function of these genes. Thus an estimate of the overall use of the genetic code by a taxonomic group can 
be obtained by summing codon frequencies of all its sequenced genes. This species-specific codon choice 
is reported in this invention from analysis of 208 plant genes. Both monocot and dicot plants are analyzed 
individually to determine whether these broader taxonomic groups are characterized by different patterns of 
synonymous codon preference. The 208 plant genes included in the codon analysis code for proteins 
having a wide range of functions and they represent 6 monocot and 36 dicot species. These proteins are 
present in different plant tissues at varying levels of expression. 

In this invention it is shown that the relative use of synonymous codons differs between the monocots 
and the dicots. In general, the most important factor in discriminating between monocot and dicot patterns 
of codon usage is the percentage G + C content of the degenerate third base. In monocots, 16 of 18 amino 
acids favor G + C in this position, while dicots only favor G+C in 7 of 18 amino acids. 
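The third-position comparison can be sketched in Python: for each amino acid with more than one codon (the 18 amino acids other than Met and Trp), compute the share of its codons ending in G or C. The codon counts below are hypothetical, and treating a share above 0.5 as "favoring" G+C is an assumption about how the comparison is scored.

```python
def gc3_fraction(codon_counts):
    """Share of an amino acid's codons that end in G or C."""
    total = sum(codon_counts.values())
    gc = sum(n for codon, n in codon_counts.items() if codon[2] in "GC")
    return gc / total


def favors_gc3(usage_by_aa):
    """Amino acids whose codon usage favours G+C at the third position."""
    return sorted(aa for aa, counts in usage_by_aa.items()
                  if gc3_fraction(counts) > 0.5)


# Hypothetical codon counts (illustrative only).
usage = {
    "L": {"CTG": 30, "CTC": 25, "CTT": 20, "TTG": 15, "TTA": 5, "CTA": 5},
    "V": {"GTT": 40, "GTG": 30, "GTC": 18, "GTA": 12},
}
print({aa: gc3_fraction(c) for aa, c in usage.items()}, favors_gc3(usage))
```

With these counts leucine favors G+C at the third position (0.70) while valine does not (0.48).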

The G-ending codons for Thr, Pro, Ala and Ser are avoided in both monocots and dicots because they
contain C in codon position II, so that a CG doublet would occupy positions II and III. The CG dinucleotide
is strongly avoided in plants (Boudraa (1987) Genet. Sel. Evol. 19:143-154) and other eukaryotes (Grantham
et al. (1985) Bull. Inst. Pasteur 83:95-148), possibly due to regulation involving methylation. In dicots, XCG
is always the least favored codon, while in monocots this is not the case. The doublet TA is also avoided in
codon positions II and III in most eukaryotes, and this is true of both monocots and dicots.

Grantham and colleagues (1986) Oxford Surveys in Evol. Biol. 3:48-81 have developed two codon
choice indices to quantify CG and TA doublet avoidance in codon positions II and III. XCG/XCC is the ratio
of G-ending to C-ending triplets among codons having C as base II, while XTA/XTT is the ratio of A-ending
to T-ending triplets among codons having T as base II. These indices have been calculated for the plant data
in this disclosure (Table 2) and support the conclusion that monocot and dicot species differ in their use of
these dinucleotides.
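A minimal Python sketch of the two indices follows, assuming a single in-frame coding sequence whose codon counts are pooled. The function name `doublet_indices` is hypothetical, and the Table 2 values were computed over pooled groups of genes rather than over one sequence.

```python
from collections import Counter


def doublet_indices(cds: str) -> dict:
    """Return XCG/XCC and XTA/XTT (each multiplied by 100) for one coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    def ratio(second: str, num_end: str, den_end: str) -> float:
        # Codons with `second` at position II, split by their position-III base.
        num = sum(n for c, n in counts.items() if c[1] == second and c[2] == num_end)
        den = sum(n for c, n in counts.items() if c[1] == second and c[2] == den_end)
        return 100.0 * num / den if den else float("nan")

    return {"XCG/XCC": ratio("C", "G", "C"), "XTA/XTT": ratio("T", "A", "T")}


# GCG, GCC, GCC, TTA, GTT: one XCG vs two XCC; one XTA vs one XTT.
print(doublet_indices("GCGGCCGCCTTAGTT"))  # {'XCG/XCC': 50.0, 'XTA/XTT': 100.0}
```

Lower values indicate stronger avoidance of the CG or TA doublet, which is how the RuBPC SSU and CAB rows of Table 2 should be read.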
Table 2

Avoidance of CG and TA doublets in codon positions II-III.
XCG/XCC and XTA/XTT values are multiplied by 100.

Group        XCG/XCC   XTA/XTT
Plants       40        37
Dicots       30        35
Monocots     61        47
Maize        67        43
Soybean      37        41
RuBPC SSU    18        9
CAB          22        13

RuBPC SSU = ribulose 1,5-bisphosphate carboxylase small subunit
CAB = chlorophyll a/b binding protein

Additionally, for two species, soybean and maize, species-specific codon usage profiles were calculated
(not shown). The maize codon usage pattern resembles that of monocots in general, since these sequences
represent the largest portion of the monocot sequences available. The codon profile of the maize subsample
is even more strikingly biased in its preference for G + C in codon position III. On the other hand, the
soybean codon usage pattern is almost identical to the general dicot pattern, even though it represents a
much smaller portion of the entire dicot sample.

In order to determine whether the coding strategy of highly expressed genes such as ribulose 1,5-
bisphosphate carboxylase small subunit (RuBPC SSU) and chlorophyll a/b binding protein (CAB) is more
biased than that of plant genes in general, codon usage profiles for subsets of these genes (19 and 17
sequences, respectively) were calculated (not shown). The RuBPC SSU and CAB pooled samples are
characterized by greater avoidance of the codons XCG and XTA than is the larger plant sample (Table 2).
Although most of the genes in these subsamples are dicot in origin (17/19 and 15/17), their codon profile
resembles that of the monocots in that G + C is utilized in the degenerate base III.

The use of pooled data for highly expressed genes may obscure identification of patterns in codon choice.
Therefore, the codon choices of individual genes for RuBPC SSU and CAB were tabulated. The preferred
codons of the maize and wheat genes for RuBPC SSU and CAB are more biased in general than are those of
the dicot species. This is in agreement with Matsuoka (1987) J. Biochem. 102:673-676, who noted the
extreme codon bias of the maize RuBPC SSU gene as well as of two other highly expressed genes in maize
leaves, CAB and phosphoenolpyruvate carboxylase. These genes almost completely avoid the use of A + T in
codon position III, although this codon bias was not as pronounced in non-leaf proteins such as alcohol
dehydrogenase, zein 22 kDa subunit, sucrose synthetase and ATP/ADP translocator. Since the wheat SSU and
CAB genes have a similar pattern of codon preference, this may reflect a common monocot pattern for these
highly expressed genes in leaves. The CAB gene for Lemna and the RuBPC SSU genes for Chlamydomonas
share a similar extreme preference for G + C in codon position III. In dicot CAB genes, however, A + T
degenerate bases are preferred by some synonymous codons (e.g., GCT for Ala, CTT for Leu, GGA and GGT
for Gly). In general, the G + C preference is less pronounced for both RuBPC SSU and CAB genes in dicots
than in monocots.
In designing a synthetic gene for expression in plants, attempts are also made to eliminate sequences which
interfere with the efficacy of gene expression.

Modifications of the Btt coding region are also preferably made to reduce the A+T content of the DNA base
composition. The Btt coding region has an A+T content of 64%, which is about 10% higher than that found in
a typical plant coding region. Since A + T-rich regions typify plant intron regions and plant regulatory
regions, it is deemed prudent to reduce the A+T content. The synthetic Btt gene is designed to have an
A + T content of 55%, in keeping with values usually found in plants.

Also, a single modification (to introduce guanine in lieu of adenine) at the fourth nucleotide position in the
Btt coding sequence is made in the preferred embodiment to form a sequence believed to function as a plant
initiation sequence (Taylor et al. (1987) Mol. Gen. Genet. 210:572-577) in optimization of expression. In
addition, in exemplifying this invention thirty-nine nucleotides (thirteen codons) are added to the coding
region of the synthetic gene in an attempt to stabilize primary transcripts. However, it appears that equally
stable transcripts are obtained in the absence of this thirty-nine-nucleotide extension.

Not all of the above-mentioned modifications of the natural Bt gene must be made in constructing a
synthetic Bt gene in order to obtain enhanced expression. For example, a synthetic gene may be
synthesized for other purposes in addition to that of achieving enhanced levels of expression. Under these
conditions, the original sequence of the natural Bt gene may be preserved within a region of DNA
corresponding to one or more, but not all, segments used to construct the synthetic gene. Depending on
the desired purpose of the gene, the modification may encompass substitution of one or more, but not all, of
the oligonucleotide segments used to construct the synthetic gene by a corresponding region of natural Bt
sequence.

As is known to those skilled in the art of synthesizing genes (Mandecki et al. (1985) Proc. Natl. Acad.
Sci. 82:3543-3547; Ferretti et al. (1986) Proc. Natl. Acad. Sci. 83:599-603), the DNA sequence to be
synthesized is divided into segment lengths which can be synthesized conveniently and without undue
complication. As exemplified herein, in preparing to synthesize the Btt gene, the coding region is divided
into thirteen segments (A - M). Each segment has unique restriction sequences at the cohesive ends.
Segment A, for example, is 228 base pairs in length and is constructed from six oligonucleotide sections,
each containing approximately 75 bases. Single-stranded oligonucleotides are annealed and ligated to form
DNA segments. The length of the protruding cohesive ends in complementary oligonucleotide segments is
four to five residues. In the strategy evolved for gene synthesis, the sites designed for the joining of
oligonucleotide pieces and DNA segments are different from the restriction sites created in the gene.
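The segment-and-oligonucleotide layout can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the helper names `design_oligos` and `revcomp` are hypothetical, the segment's outer ends are treated as blunt (the actual segments carry restriction-site cohesive ends), and junctions are placed at evenly spaced positions rather than chosen to avoid the restriction sites built into the gene. With `max_len=76`, a 228 bp segment such as segment A yields three top-strand and three bottom-strand oligonucleotides staggered by a 4-base overlap.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(segment: str, max_len: int = 76, overhang: int = 4):
    """Split a double-stranded segment into staggered top/bottom oligonucleotides.

    Top-strand pieces end at evenly spaced cut points; bottom-strand cut points
    are shifted `overhang` bases downstream, so each top-strand junction is
    spanned by a bottom-strand oligo and vice versa.
    """
    n = len(segment)
    k = math.ceil(n / max_len)                       # oligos per strand
    top_cuts = [round(i * n / k) for i in range(k + 1)]
    if k > 1 and min(b - a for a, b in zip(top_cuts, top_cuts[1:])) <= overhang:
        raise ValueError("pieces are too short for the requested overhang")
    bottom_cuts = [0] + [c + overhang for c in top_cuts[1:-1]] + [n]
    top = [segment[a:b] for a, b in zip(top_cuts, top_cuts[1:])]
    # Bottom-strand oligos are written 5'->3', i.e. as reverse complements.
    bottom = [revcomp(segment[a:b]) for a, b in zip(bottom_cuts, bottom_cuts[1:])]
    return top, bottom


segment_a = "ATGGCT" * 38                            # placeholder 228 bp sequence
top, bottom = design_oligos(segment_a)
print([len(o) for o in top], [len(o) for o in bottom])  # [76, 76, 76] [80, 76, 72]
```

The offset between the two strands' cut points is what lets the annealed oligonucleotides hold together before ligation; a practical design would also screen each junction for unwanted complementarity.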

In the specific embodiment, each DNA segment is cloned into a pIC-20 vector for amplification of the
DNA. The nucleotide sequence of each fragment is determined at this stage by the dideoxy method using
the recombinant phage DNA as templates and selected synthetic oligonucleotides as primers.

As exemplified herein and illustrated schematically in Figures 3 and 4, each segment individually (e.g.,
segment M) is excised at the flanking restriction sites from its cloning vector and spliced into the vector
containing segment A. Most often, segments are added as a pair rather than singly to increase efficiency.
Thus, the entire gene is constructed in the original plasmid harboring segment A. The nucleotide sequence
of the entire gene is determined and found to correspond exactly to that shown in Figure 1.

In preferred embodiments the synthetic Btt gene is expressed in plants at an enhanced level when
compared to that observed with natural Btt structural genes. To that end, the synthetic structural gene is
combined with a promoter functional in plants, the structural gene and the promoter region being in such
position and orientation with respect to each other that the structural gene can be expressed in a cell in
which the promoter region is active, thereby forming a functional gene. The promoter regions include, but
are not limited to, bacterial and plant promoter regions. To express the promoter region/structural gene
combination, the DNA segment carrying the combination is contained by a cell. Combinations which include
plant promoter regions are contained by plant cells, which, in turn, may be contained by plants or seeds.
Combinations which include bacterial promoter regions are contained by bacteria, e.g., Bt or E. coli. Those
in the art will recognize that expression in types of micro-organisms other than bacteria may in some
circumstances be desirable and, given the present disclosure, feasible without undue experimentation.

The recombinant DNA molecule carrying a synthetic structural gene under promoter control can be
introduced into plant tissue by any means known to those skilled in the art. The technique used for a given
plant species or specific type of plant tissue depends on the known successful techniques. As novel means
are developed for the stable insertion of foreign genes into plant cells and for manipulating the modified
cells, skilled artisans will be able to select from known means to achieve a desired result. Means for
introducing recombinant DNA into plant tissue include, but are not limited to, direct DNA uptake
(Paszkowski, J. et al. (1984) EMBO J. 3:2717), electroporation (Fromm, M. et al. (1985) Proc. Natl. Acad.
Sci. USA 82:5824), microinjection (Crossway, A. et al. (1986) Mol. Gen. Genet. 202:179), or T-DNA-
mediated transfer from Agrobacterium tumefaciens to the plant tissue. There appears to be no fundamental
limitation of T-DNA transformation to the natural host range of Agrobacterium. Successful T-DNA-mediated
transformation of monocots (Hooykaas-Van Slogteren, G. et al. (1984) Nature 311:763), gymnosperms
(Dandekar, A. et al. (1987) Biotechnology 5:587) and algae (Ausich, R., EPO application 108,580) has been
reported. Representative T-DNA vector systems are described in the following references: An, G. et al.
(1985) EMBO J. 4:277; Herrera-Estrella, L. et al. (1983) Nature 303:209; Herrera-Estrella, L. et al. (1983)
EMBO J. 2:987; Herrera-Estrella, L. et al. (1985) in Plant Genetic Engineering, New York: Cambridge
University Press, p. 63. Once introduced into the plant tissue, the expression of the structural gene may be
assayed by any means known to the art, and expression may be measured as mRNA transcribed or as
protein synthesized. Techniques are known for the in vitro culture of plant tissue, and in a number of cases,
for regeneration into whole plants. Procedures for transferring the introduced expression complex to 
commercially useful cultivars are known to those skilled in the art. 

In one of its preferred embodiments the invention disclosed herein comprises expression in plant cells
of a synthetic insecticidal structural gene under control of a plant-expressible promoter, that is to say, by
inserting the insecticide structural gene into T-DNA under control of a plant-expressible promoter and
introducing the T-DNA containing the insert into a plant cell using known means. Once plant cells
expressing a synthetic insecticidal structural gene under control of a plant-expressible promoter are
obtained, plant tissues and whole plants can be regenerated therefrom using methods and techniques well
known in the art. The regenerated plants are then reproduced by conventional means and the introduced
genes can be transferred to other strains and cultivars by conventional plant breeding techniques.

The introduction and expression of the synthetic structural gene for an insecticidal protein can be used
to protect a crop from infestation with common insect pests. Other uses of the invention, exploiting the
properties of other insecticide structural genes introduced into other plant species, will be readily apparent
to those skilled in the art. The invention in principle applies to introduction of any synthetic insecticide
structural gene into any plant species into which foreign DNA (in the preferred embodiment T-DNA) can be
introduced and in which said DNA can remain stably replicated. In general, these taxa presently include, but
are not limited to, gymnosperms and dicotyledonous plants, such as sunflower (family Compositae),
tobacco (family Solanaceae), alfalfa, soybeans and other legumes (family Leguminosae), cotton (family
Malvaceae), and most vegetables, as well as monocotyledonous plants. A plant containing in its tissues
increased levels of insecticidal protein will control less susceptible types of insect, thus providing advantage
over present insecticidal uses of Bt. By incorporation of the insecticidal protein into the tissues of a plant
the present invention additionally provides advantage over present uses of insecticides by eliminating
instances of nonuniform application and the costs of buying and applying insecticidal preparations to a field.
Also the present invention eliminates the need for careful timing of application of such preparations since
small larvae are most sensitive to insecticidal protein and the protein is always present, minimizing crop
damage that would otherwise result from preapplication larval foraging.

This invention combines the specific teachings of the present disclosure with a variety of techniques
and expedients known in the art. The choice of expedients depends on variables such as the choice of
insecticidal protein from a Bt strain, the extent of modification in preferred codon usage, manipulation of
sequences considered to be destabilizing to RNA or sequences prematurely terminating transcription,
insertions of restriction sites within the design of the synthetic gene to allow future nucleotide modifications,
addition of introns or enhancer sequences to the 5' and/or 3' ends of the synthetic structural gene, the
promoter region, the host in which a promoter region/structural gene combination is expressed, and the like.
As novel insecticidal proteins and toxic polypeptides are discovered, and as sequences responsible for
enhanced cross-expression (expression of a foreign structural gene in a given host) are elucidated, those of
ordinary skill will be able to select among those elements to produce "improved" synthetic genes for
desired proteins having agronomic value. The fundamental aspect of the present invention is the ability to
synthesize a novel gene coding for an insecticidal protein, designed so that the protein will be expressed at
an enhanced level in plants, yet so that it will retain its inherent property of insect toxicity and retain or
increase its specific insecticidal activity.
EXAMPLES

The following Examples are presented as illustrations of embodiments of the present invention. They do
not limit the scope of this invention, which is determined by the claims.

The following strains were deposited with the Patent Culture Collection, Northern Regional Research
Center, 1815 N. University Street, Peoria, Illinois 61604.



Strain                              Deposited on      Accession #
E. coli MC1061 (p544-HindIII)       6 October 1987    NRRL B-18257
E. coli MC1061 (p544Pst-Met5)       6 October 1987    NRRL B-18258



The deposited strains are provided for the convenience of those in the art, and are not necessary to
practice the present invention, which may be practiced with the present disclosure in combination with
publicly available protocols, information, and materials. E. coli MC1061, a good host for plasmid
transformations, was disclosed by Casadaban, M.J. and Cohen, S.N. (1980) J. Mol. Biol. 138:179-207.
Example 1: Design of the synthetic insecticidal crystal protein gene.

(i) Preparation of toxic subclones of the Btt gene

Construction, isolation, and characterization of pNSBP544 is disclosed by Sekar, V. et al. (1987) Proc.
Natl. Acad. Sci. USA 84:7036-7040, and Sekar, V. and Adang, M.J., U.S. patent application serial no.
108,285, filed October 13, 1987, which is hereby incorporated by reference. A 3.0 kbp HindIII fragment
carrying the crystal protein gene of pNSBP544 is inserted into the HindIII site of pIC-20H (Marsh, J.L. et al.
(1984) Gene 32:481-485), thereby yielding a plasmid designated p544-HindIII, which is on deposit.
Expression in E. coli yields a 73 kDa crystal protein in addition to the 65 kDa species characteristic of the
crystal protein obtained from Btt isolates.

A 5.9 kbp BamHI fragment carrying the crystal protein gene is removed from pNSBP544 and inserted
into BamHI-linearized pIC-20H DNA. The resulting plasmid, p405/44-7, is digested with BglII and religated,
thereby removing Bacillus sequences flanking the 3'-end of the crystal protein gene. The resulting plasmid,
p405/54-12, is digested with PstI and religated, thereby removing Bacillus sequences flanking the 5'-end of
the crystal protein gene and about 150 bp from the 5'-end of the crystal protein structural gene. The resulting
plasmid, p405/81-4, is digested with SphI and PstI and is mixed with and ligated to a synthetic linker having
the following structure:

        SD         MetThrAla
5'     CAGGATCCAACAATGACTGCA 3'
3' GTACGTCCTAGGTTGTTACTG     5'
   SphI                 PstI

(SD indicates the location of a Shine-Dalgarno prokaryotic ribosome binding site.) The resulting plasmid,
p544Pst-Met5, contains a structural gene encoding a protein identical to one encoded by pNSBP544 except
for a deletion of the amino-terminal 47 amino acid residues. The nucleotide sequence of the Btt coding
region in p544Pst-Met5 is presented in Figure 1. In bioassays (Sekar and Adang, U.S. patent application
serial no. 108,285, supra), the proteins encoded by the full-length Btt gene in pNSBP544 and the N-terminal
deletion derivative, p544Pst-Met5, were shown to be equally toxic. All of the plasmids mentioned above
have their crystal protein genes in the same orientation as the lacZ gene of the vector.
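The linker drawn above can be checked mechanically. The following Python sketch is an illustration, not part of the described procedure: it reads the bottom strand 5'->3', confirms that the two strands pair over 17 bp, and reports the single-stranded ends and the Met-Thr-Ala reading frame. The expectation that the left end matches SphI-cut DNA (3' overhang CATG) and the right end matches PstI-cut DNA (3' overhang TGCA) follows from those enzymes' recognition sites, GCATG^C and CTGCA^G. The internal GGATCC (a BamHI site) is an observation from the sequence, not something stated in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


top = "CAGGATCCAACAATGACTGCA"                 # top strand, 5'->3'
bottom = "GTACGTCCTAGGTTGTTACTG"[::-1]        # printed 3'->5' above; reversed to 5'->3'
offset = 4                                    # bottom strand starts 4 nt left of the top strand

# The bottom strand, rewritten in top-strand orientation, must match top over the duplex.
paired = revcomp(bottom)
assert paired[offset:] == top[:len(top) - offset]

sph_end = bottom[-offset:]                    # 3' overhang at the left (SphI) end
pst_end = top[-offset:]                       # 3' overhang at the right (PstI) end
start = top.find("ATG")
codons = [top[i:i + 3] for i in range(start, len(top) - 2, 3)]
print(sph_end, pst_end, top.find("GGATCC"), codons)
# CATG TGCA 2 ['ATG', 'ACT', 'GCA']
```

Both overhangs are self-complementary four-base 3' extensions, which is why the linker can be ligated directionally into SphI/PstI-digested p405/81-4.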
(ii) Modification of preferred codon usage

Table 1 presents the frequency of codon usage for (A) dicot proteins, (B) Bt proteins, (C) the synthetic
Btt gene, and (D) monocot proteins. Although some codons for a particular amino acid are utilized to
approximately the same extent by both dicot and Bt proteins (e.g., the codons for serine), for the most part,
the distribution of codon frequency varies significantly between dicot and Bt proteins, as illustrated in
columns A and B in Table 1.
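The "distribution fraction" reported in Table 1 is, for each codon, its share of all codons encoding the same amino acid, with the stop codons pooled as "End". A minimal Python sketch of that calculation for a single coding sequence follows. The function name `distribution_fractions` is hypothetical, the standard genetic code is built from the conventional TCAG codon ordering, and sequences containing bases other than A, C, G and T would raise a KeyError in this sketch.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def distribution_fractions(cds: str) -> dict:
    """Fraction of each amino acid's codons contributed by each synonymous codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codon_counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    per_amino_acid = defaultdict(int)
    for codon, n in codon_counts.items():
        per_amino_acid[GENETIC_CODE[codon]] += n
    return {codon: n / per_amino_acid[GENETIC_CODE[codon]]
            for codon, n in codon_counts.items()}


# One AAG and three AAA lysine codons.
print(distribution_fractions("AAGAAAAAAAAA"))  # {'AAG': 0.25, 'AAA': 0.75}
```

Pooling counts from many genes before dividing, rather than averaging per-gene fractions, gives the kind of group-level profile shown in columns A, B and D.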
Table 1. Frequency of Codon Usage

                Distribution Fraction
Amino           (A) Dicot   (B) Bt   (C) Synthetic   (D) Monocot
Acid    Codon   Genes       Genes    Btt Gene        Genes
Gly     GGG     0.12        0.08     0.13            0.21
Gly     GGA     0.37        0.53     0.37            0.18
Gly     GGT                          0.34            0.21
Gly     GGC     0.16        0.16     0.16            0.40
Glu     GAG     0.52        0.13     0.52            0.77
Glu     GAA     0.48        0.87     0.48            0.23
Asp     GAT                 0.68     0.56            0.31
Asp     GAC     0.43        0.32     0.44            0.69
Val     GTG     0.30        0.15     0.30            0.38
Val     GTA     0.12        0.32     0.10            0.07
Val     GTT                          0.35            0.20
Val     GTC     0.20        0.24     0.25            0.34
Ala     GCG     0.05        0.12     0.06            0.20
Ala     GCA     0.26        0.50     0.24            0.16
Ala     GCT     0.44                 0.41            0.28
Ala     GCC     0.28        0.06     0.29            0.36
Lys     AAG     0.61        0.13     0.58            0.87
Lys     AAA     0.39        0.87     0.42            0.13
Asn     AAT     0.43                 0.44            0.23
Asn     AAC     0.55        0.21     0.56            0.77
Met     ATG     1.00        1.00     1.00            1.00
Ile     ATA     0.19        0.30     0.20            0.09
Ile     ATT                          0.43            0.27
Ile     ATC     0.36        0.13     0.37            0.64
Thr     ACG     0.07        0.14     0.07            0.18
Thr     ACA     0.27        0.68     0.27            0.14
Thr     ACT     0.36        0.14     0.34            0.22
Thr     ACC     0.31        0.05     0.32            0.47
Trp     TGG     1.00        1.00     1.00            1.00
End     TGA     0.46        0.00     0.00            0.34
Cys     TGT     0.43        0.33     0.33            0.27
Cys     TGC     0.57        0.67     0.67            0.73
End     TAG     0.18        0.00     0.00            0.44
End     TAA     0.37        1.00     1.00            0.22
Tyr     TAT     0.42        0.81     0.43            0.19
Tyr     TAC     0.58        0.19     0.57            0.81
Table 1 (CONTINUED)

                Distribution Fraction
Amino           (A) Dicot   (B) Bt   (C) Synthetic   (D) Monocot
Acid    Codon   Genes       Genes    Btt Gene        Genes
Phe     TTT     0.45        0.75     0.44            0.28
Phe     TTC     0.55        0.25     0.56            0.72
Ser     AGT     0.14        0.25     0.13            0.07
Ser     AGC     0.18        0.13     0.19            0.25
Ser     TCG     0.05        0.08     0.06            0.13
Ser     TCA     0.18        0.19     0.17            0.13
Ser     TCT     0.26        0.25     0.27            0.18
Ser     TCC     0.19        0.10     0.17            0.24
Arg     AGG     0.22        0.09     0.23            0.28
Arg     AGA     0.31        0.50     0.32            0.08
Arg     CGG     0.04        0.14     0.05            0.14
Arg     CGA     0.09        0.14     0.09            0.04
Arg     CGT     0.23        0.09     0.23            0.11
Arg     CGC     0.11        0.05     0.09            0.36
Gln     CAG     0.38        0.18     0.39            0.43
Gln     CAA     0.62        0.82     0.61            0.57
His     CAT     0.52        0.90     0.50            0.38
His     CAC     0.48        0.10     0.50            0.62
Leu     TTG     0.26        0.08     0.27            0.15
Leu     TTA     0.10        0.46     0.12            0.04
Leu     CTG     0.09        0.04     0.10            0.27
Leu     CTA     0.08        0.21     0.10            0.11
Leu     CTT     0.29        0.15     0.18            0.16
Leu     CTC     0.19        0.06     0.22            0.27
Pro     CCG     0.07        0.20     0.08            0.20
Pro     CCA     0.44        0.56     0.44            0.39
Pro     CCT     0.32        0.24     0.32            0.19
Pro     CCC     0.16        0.00     0.16            0.22
Bt coding sequences publicly available and 88 coding 
sequences of dicot nuclear genes were used to compile the 
codon usage table. The pooled dicot coding sequences, 
obtained from Genbank, were: 
GENUS/SPECIES    GENBANK    PROTEIN    REF

Antirrhinum majus
Arabidopsis thaliana

Bertholletia excelsa
Brassica campestris
Brassica napus
Brassica oleracea
Canavalia ensiformis
Carica papaya
Chlamydomonas reinhardtii

Cucurbita pepo
Cucumis sativus

Daucus carota

Dolichos biflorus
Flaveria trinervia
Glycine max



AMACHS
ATHADH
ATHH3GA
ATHH3GB
ATHH4GA
ATHLHCP1
ATHTUBA

BNANAP
BOLSLSCR
CENCONA
CPAPAP
CREC552
CRERBCS1
CRERBCS2
CUCPHY
CUSCMS
CUSLHCPA
CUSSSU
DAREXT
DAREXnt
DBILECS
FTRBCR
SOY7SAA
SOYACTlC
soYCiiri
SOYGLYA1A
SOYGLYAAB
SOYGLYAB
SOYGLYR
SOYHSP17S
SOYLGBI
SOYIXA
SOYLOX
SOYN0020C
SOYNOB23G
SOYNO0I4H
SOYNOOX6B
SOWOOMR
SOYNO027R
SOYNOD35M
SOYNOD7S
SOYNODR1
SOYNODR2
SOYPRP1
SOYRUBP
SOYURA
SOYHSP24A
Chalcone synthetase
Alcohol dehydrogenase
Histone 3 gene 1
Histone 3 gene 2
Histone 4 gene 1
CAB
α tubulin
5-enolpyruvylshikimate 3-phosphate synthetase
High methionine storage protein
Acyl carrier protein
Napin
S-locus specific glycoprotein
Concanavalin A
Papain
Preapocytochrome
RuBPC small subunit gene 1
RuBPC small subunit gene 2
Phytochrome
Glyoxysomal malate synthetase
CAB
RuBPC small subunit
Extensin
33 kD extensin-related protein
Seed lectin
RuBPC small subunit
7S storage protein
Actin 1
Ot protease inhibitor
Glycinin A1aBx subunit
Glycinin A5A4B3 subunit
Glycinin A3B4 subunits
Glycinin A2BU subunits
Low MW heat shock proteins
Leghemoglobin
Lectin
Lipoxygenase 1
20 kDa nodulin
23 kDa nodulin
24 kDa nodulin
26 kDa nodulin
26 kDa nodulin
27 kDa nodulin
35 kDa nodulin
75 kDa nodulin
Nodulin CSX
Nodulin E27
Proline rich protein
RuBPC small subunit
Urease
Heat shock protein 26A
Nuclear-encoded chloroplast heat shock protein
22 kDa nodulin
α tubulin
β tubulin

5
6
6
GENUS/SPECIES    GENBANK    PROTEIN

Helianthus annuus    IINNKUIU.N    RuBPC small subunit
    Albumin seed storage protein
Ipomoea batatas    Wound-induced catalase
Lemna gibba    LGIAIH9    CAB
    LGrRSIlPC    RuBPC small subunit
Lupinus luteus    LUPLBU    Leghemoglobin I
Lycopersicon esculentum    TOMBIOUR    Biotin binding protein
    TOMETinTJR    Ethylene biosynthesis protein
    TOMPG2AR    Polygalacturonase-2a
    TOMPSI    Tomato photosystem I protein
    TOMRBCSA    RuBPC small subunit
    TOMRBCSB    RuBPC small subunit
    TOMRBCSC    RuBPC small subunit
    TOMRBCSD    RuBPC small subunit
    TOMRRD    Ripening related protein
    TOMWIPIC    Wound-induced proteinase inhibitor I
    TOMWIPII    Wound-induced proteinase inhibitor II
    CAB 1A
    CAB 1B
    CAB 3C
    CAB 4
    CAB 5
    ALFLB3R    Leghemoglobin III
    RuBPC small subunit
    TOBATP21    Mitochondrial ATP synthase β subunit
    Nitrate reductase
    Glutamine synthetase
    TOBECH    Endochitinase
    TOBGAPA    A subunit of chloroplast G3PD
    TOBGAPB    B subunit of chloroplast G3PD
    TOBGAPC    C subunit of chloroplast G3PD
    TOBPR1AR    Pathogenesis-related protein 1a
    TOBPR1CR    Pathogenesis-related protein 1c
    TOBPRPR    Pathogenesis-related protein 1b
    TOBPXOLF    Peroxidase
    TOBRBPCO    RuBPC small subunit
    TOBTHAUR    TMV-induced protein homologous to thaumatin
    AVOCEL    Cellulase
    PI tOC ML    Chalcone synthase
    PETCAB13    CAB 13
    PETCAB22L    CAB 22L
    PETCAB22R    CAB 22R
    PETCAB25    CAB 25
    PETCAB37    CAB 37
    PETCAB91R    CAB 91R
    PETCHSR    Chalcone synthase
    PETGCR1    Glycine-rich protein
    PETRBCS08    RuBPC small subunit
    PETRBCSU    RuBPC small subunit
    70 kDa heat shock protein
Phaseolus vulgaris    PHVCHM    Chitinase
    PHVDLECA    Phytohemagglutinin E
    PHVDLECQ    Phytohemagglutinin L
    PHVGSR1    Glutamine synthetase 1
    PHVGSR2    Glutamine synthetase 2
Medicago sativa

Mesembryanthemum crystallinum

Nicotiana plumbaginifolia

Nicotiana tabacum

Persea americana
Petroselinum hortense
Petunia sp.
GENUS/SPECIES    GENBANK    PROTEIN
Pisum sativum
Raphanus sativus
Ricinus communis
Silene pratensis
Sinapis alba
Solanum tuberosum
Spinacia oleracea
Vicia faba

PHVLBA
PHVPAL
PHVPHASAR
PHVPHASBR

PEAALD2
PEACAB80
PEAGSR1
PEALECA
PEALEGA
PEARUBPS
PEAVIC2
PEAVIC4
PEAVIC7

RCCACG
RCCRICIN
RCCICL
SIPfBX
SIPPCY
SAUGAPDK
POTPAT
POTINHWI
POTLS1G
POTPIZG
POTRBCS
SPIACPI
SPIOEC16
SPtOEOS
SPIPCC
srirsss

VFALBA
VFALEB4

Leghemoglobin
Lectin
Phenylalanine ammonia lyase
α phaseolin
β phaseolin
Arcelin seed protein
Chalcone synthase
Seed albumin
CAB
Glutamine synthetase (nodule)
Lectin
Legumin
RuBPC small subunit
Vicilin
Vicilin
Vicilin
Alcohol dehydrogenase I
Glutamine synthetase (leaf)
Glutamine synthetase (root)
Histone 1
Nuclear encoded chloroplast heat shock protein
RuBPC small subunit
Agglutinin
Ricin
Isocitrate lyase
Ferredoxin precursor
Plastocyanin precursor
Nuclear gene for G3PD
Patatin
Wound-induced proteinase inhibitor
Light-inducible tissue-specific ST-LS1 gene
Wound-induced proteinase inhibitor II
RuBPC small subunit
Sucrose synthetase
Acyl carrier protein I
16 kDa photosynthetic oxygen-evolving protein
23 kDa photosynthetic oxygen-evolving protein
Plastocyanin
33 kDa photosynthetic water oxidation complex precursor
Glycolate oxidase
Leghemoglobin
Legumin B
Vicilin
GENUS/SPECIES    GENBANK    PROTEIN

Avena sativa
Hordeum vulgare

Oryza sativa
Triticum aestivum

Secale cereale
Zea mays
ASTAP3R
BLYALR
BLYAMY1
BLYAMY2
BLYCHORD1
BLYGLUCB
BLYHORB
BLYTAPI
BLYTHIAR
BLYUBIQR

RICCLUTC
WHTAMYA
WHTCAB
WHTCMR
uirrciR
WHTCLGB
wtrrcuABA
wirrcLLm
WHTH3
WHTH4091
WHTRBCB
RYESECCSR
MZEA1C
MZEACTIG
MZEADHHF
MZEADK2KR
MZEALD
MZEAKT
MZEEG2R
MZECGST38
MZEH3Q
MZEH4CM
MZEHSP701
MZEIISITO
MZEUICT
MZEMIL3
MZErercR
MZERBCS
MZESUSYSC
MZETP12
MZEZEA20M
MZEZEA30M
MZEZE15A3
MZEZE16
MZEZE19A
MZEZE22A
MZEZE22Q


Phytochrome 3
Aleurain
α amylase 1
α amylase 2
Hordein C
β glucanase
B1 hordein
Amylase/protease inhibitor
Toxin α hordothionin
Ubiquitin

Histone 3    25
Leaf specific thionin 1    26
Leaf specific thionin 2    26
Plastocyanin    27
Glutelin
Glutelin    28
α amylase
CAB
Em protein
Gibberellin responsive protein
γ gliadin
α/β gliadin class AII
High MW glutenin
Histone 3
Histone 4
RuBPC small subunit
Secalin
40.1 kD A1 protein (NADPH-dependent reductase)
Actin
Alcohol dehydrogenase 1
Alcohol dehydrogenase 2
Aldolase
ATP/ADP translocator
Glutelin 2
Glutathione S transferase
Histone 3
Histone 4
70 kD heat shock protein, exon 1
70 kD heat shock protein, exon 2
CAB
Lipid body surface protein L3
Phosphoenolpyruvate carboxylase
RuBPC small subunit
Sucrose synthetase
Triosephosphate isomerase 1
19 kD zein
19 kD zein
15 kD zein
16 kD zein
19 kD zein
22 kD zein
22 kD zein
Catalase 2    29
Regulatory C1 locus    30
Bt codons were obtained from analysis of ... of the following genes: Bt var. kurstaki ... in Invertebrate
Pathology ..., New York, pp. 85-99); Bt var. ... HD-1 4.5 kb fragment (Schnepf and Whiteley ...
260:6273-6280); and Bt var. tenebrionis 3.0 kb HindIII fragment (Sekar et al. (1987) Proc. Natl. Acad. Sci.
84:7036-7040).
For example, dicots utilize the AAG codon for lysine with a frequency of 61% and the AAA codon with a
frequency of 39%. In contrast, in Bt proteins the lysine codons AAG and AAA are used with a frequency of
13% and 87%, respectively. It is known in the art that seldom-used codons are generally detrimental to
expression in a given system and must be avoided or used judiciously. Thus, in designing a synthetic gene
encoding the Btt crystal protein, individual amino acid codons found in the original Btt gene are altered to
reflect the codons preferred by dicot genes for a particular amino acid. However, attention is given to
maintaining the overall distribution of codons for each amino acid within the coding region of the gene. For
example, in the case of alanine, it can be seen from Table 1 that the codon GCA is used in Bt proteins with a
frequency of 50%, whereas the codon GCT is the preferred codon in dicot proteins. In designing the synthetic
Btt gene, not all codons for alanine in the original Bt gene are replaced by GCT; instead, only some alanine
codons are changed to GCT while others are replaced with different alanine codons in an attempt to
preserve the overall distribution of codons for alanine used in dicot proteins. Column C in Table 1
documents that this goal is achieved; the frequency of codon usage in dicot proteins (column A) corresponds
very closely to that used in the synthetic Btt gene (column C).
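The text does not spell out how individual codons were reassigned. One way to reach a target distribution exactly, sketched here as an assumption rather than as the procedure actually used, is to apportion the residues of each amino acid among its synonymous codons with largest-remainder rounding. The helper name `allocate_codons` is hypothetical; the alanine frequencies below are the dicot values from Table 1, column A (they sum to 1.03 because of rounding, so they are renormalized), and the count of 50 alanine residues is purely illustrative.

```python
import math


def allocate_codons(n_residues: int, target: dict) -> dict:
    """Split n_residues among synonymous codons in proportion to target
    frequencies, using largest-remainder rounding so the counts sum exactly."""
    total = sum(target.values())                   # renormalize, e.g. 1.03 -> 1.00
    quotas = {codon: n_residues * f / total for codon, f in target.items()}
    counts = {codon: math.floor(q) for codon, q in quotas.items()}
    leftover = n_residues - sum(counts.values())
    # Give the remaining residues to the codons with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for codon in by_remainder[:leftover]:
        counts[codon] += 1
    return counts


dicot_ala = {"GCG": 0.05, "GCA": 0.26, "GCT": 0.44, "GCC": 0.28}  # Table 1, column A
print(allocate_codons(50, dicot_ala))  # {'GCG': 2, 'GCA': 13, 'GCT': 21, 'GCC': 14}
```

A full design would then place these codons along the sequence while also avoiding CG and TA doublets, unwanted restriction sites and destabilizing sequences, which this sketch does not attempt.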

In similar manner, a synthetic gene coding for insecticidal crystal protein can be optimized for
enhanced expression in monocot plants. Column D of Table 1 presents the frequency of codon usage
in highly expressed monocot proteins.
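Per-amino-acid codon frequencies of the kind tabulated in Table 1 can be computed from any coding sequence. The sketch below is a minimal illustration assuming the standard genetic code and an in-frame sequence; the function name `codon_usage` is hypothetical, and the toy input in the usage note is not data from Table 1.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, ... GGG ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(cds):
    """Return {amino acid: {codon: percent of that amino acid's codons}}."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    by_aa = defaultdict(dict)
    for codon, n in counts.items():
        if codon in CODE:  # skip codons containing ambiguous bases
            by_aa[CODE[codon]][codon] = n
    usage = {}
    for aa, codon_counts in by_aa.items():
        total = sum(codon_counts.values())
        usage[aa] = {c: 100 * n / total for c, n in codon_counts.items()}
    return usage
```

For a toy sequence of 61 AAG and 39 AAA codons, `codon_usage(...)["K"]` gives 61.0% AAG and 39.0% AAA, the dicot lysine split quoted above.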

Because of the degenerate nature of the genetic code, only part of the variation contained in a gene is
expressed in the encoded protein. Variation between degenerate base frequencies is clearly not a neutral
phenomenon, since systematic codon preferences have been reported for bacterial, yeast and mammalian
genes. Analysis of a large group of plant gene sequences indicates that synonymous codons are used
differently by monocots and dicots. These patterns are also distinct from those reported for E. coli and yeast.
The importance of codon choice has been emphasized in engineering high expression of foreign genes in
yeast and other systems.
The synthetic Btt gene has an A + T content of 55%.
Table 3

Adenine + Thymine Content in Btt Coding Region

                                     Base
Coding Region           G      A      T      C     %G + C    %A + T
Natural Btt gene       341    633    514    306      36        64
Synthetic Btt gene     392    530    483    428      45        55
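The percentages in Table 3 follow directly from the base counts: the natural gene totals 1,794 bases (647 G + C) and the synthetic gene 1,833 bases (820 G + C). The small Python helper below, a hypothetical sketch rather than the software actually used, recomputes the rounded %G + C and %A + T values from a coding sequence.

```python
def base_composition(seq):
    """Return base counts and rounded %G+C and %A+T for a DNA sequence."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "GATC"}
    total = sum(counts.values())
    gc = 100 * (counts["G"] + counts["C"]) / total
    return counts, round(gc), round(100 - gc)


# Stand-in sequences rebuilt from the Table 3 counts (base order is irrelevant here).
natural = "G" * 341 + "A" * 633 + "T" * 514 + "C" * 306
synthetic = "G" * 392 + "A" * 530 + "T" * 483 + "C" * 428
```

`base_composition(natural)` returns 36% G + C and 64% A + T, and `base_composition(synthetic)` returns 45% and 55%, matching the table.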
In addition, the natural Btt gene is scanned for sequences that are potentially destabilizing to Btt RNA.
These sequences, when identified in the original Btt gene, are eliminated through modification of the nucleotide
sequence. Included in this group of potentially destabilizing sequences are:

(a) plant polyadenylation signals (as described by Joshi (1987) Nucl. Acids Res. 15:9627-9640). In
eukaryotes, the primary transcripts of nuclear genes are extensively processed (steps including 5'
capping, intron splicing and polyadenylation) to form mature, translatable mRNAs. In higher plants,
polyadenylation involves endonucleolytic cleavage at the polyA site followed by the addition of several A
residues to the cleaved end. The selection of the polyA site is presumed to be cis-regulated. During
expression of Bt protein and RNA in different plants, the present inventors have observed that the
polyadenylated mRNA isolated from these expression systems is not full-length but instead is truncated or
degraded. Hence, in the present invention it was decided to minimize possible destabilization of RNA
through elimination of potential polyadenylation signals within the coding region of the synthetic Btt gene.
Plant polyadenylation signals, including the AATAAA, AATGAA, AATAAT, AATATT, GATAAA and AATAAG
motifs, do not appear in the synthetic Btt gene when it is scanned for exact (0-mismatch) matches to these sequences.

(b) polymerase II termination sequence, CAN(7-9)AGTNNAA, where N is any nucleotide. This sequence was
shown (Vankan and Filipowicz (1988) EMBO J. 7:791-799) to lie next to the 3' end of the coding region of the
U2 snRNA genes of Arabidopsis thaliana and is believed to be important for transcription termination upon 3'
end processing. The synthetic Btt gene is devoid of this termination sequence.

(c) CUUCGG hairpins, responsible for extraordinarily stable RNA secondary structures associated
with various biochemical processes (Tuerk et al. (1988) Proc. Natl. Acad. Sci. 85:1364-1368). The
exceptional stability of CUUCGG hairpins suggests that they have an unusual structure and may function in
organizing the proper folding of complex RNA structures. CUUCGG hairpin sequences are not found with
either 0 or 1 mismatches in the Btt coding region.

(d) plant consensus splice sites, 5' = AAG:GTAAGT and 3' = TTTT(Pu)TTT(Pu)T(Pu)T(Pu)T(Pu)TGCAG:C,
as described by Brown et al. (1986) EMBO J. 5:2749-2758. Consensus sequences for the 5' and
3' splice junctions have been derived from 20 and 30 plant intron sequences, respectively. Although such
potential splice sequences are not likely to be present in Bt genes, a search was initiated for sequences
resembling plant consensus splice sites in the synthetic Btt gene. For the 5' splice site, the closest match
had three mismatches. This search gave 12 sequences, of which two contained G:GT. Only the site at position
948 was changed, because the site at position 1323 contains the KpnI site needed for reconstruction. The
3' splice site is not found in the synthetic Btt gene.
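The motif searches in items (a), (c) and (d) amount to scanning the coding sequence for every window within a given number of mismatches of a target motif. A minimal Python sketch follows; it assumes a simple Hamming-distance scan of the sense strand only, and the helper names are hypothetical rather than taken from the source. RNA motifs such as CUUCGG are converted to their DNA form (CTTCGG) before scanning.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq, motif, max_mismatches=0):
    """0-based start positions where motif occurs in seq with <= max_mismatches."""
    seq = seq.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    k = len(motif)
    return [i for i in range(len(seq) - k + 1)
            if hamming(seq[i:i + k], motif) <= max_mismatches]


# Plant polyadenylation motifs listed in item (a).
POLYA_MOTIFS = ["AATAAA", "AATGAA", "AATAAT", "AATATT", "GATAAA", "AATAAG"]


def polya_hits(seq):
    """Exact (0-mismatch) polyadenylation-signal hits, as in item (a)."""
    hits = {m: find_motif(seq, m) for m in POLYA_MOTIFS}
    return {m: pos for m, pos in hits.items() if pos}
```

Under these assumptions, item (c) corresponds to `find_motif(cds, "CUUCGG", 1)` returning an empty list, and the 5' splice-site search of item (d) to `find_motif(cds, "AAGGTAAGT", 3)`, with the colon (the exon/intron boundary) dropped from the motif.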

Thus, by highlighting potential RNA-destabilizing sequences, the synthetic Btt gene is designed to
eliminate known eukaryotic regulatory sequences that affect RNA synthesis and processing.
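The degenerate polymerase II termination motif of item (b), CAN(7-9)AGTNNAA, is most easily searched with a regular expression in which each N becomes [ACGT]. The Python sketch below is illustrative only: the function name is hypothetical and only the sense strand is scanned. A zero-width lookahead is used so that overlapping candidates are still reported.

```python
import re

# CA, then 7 to 9 arbitrary bases, then AGT, two arbitrary bases, and AA.
POL2_TERM = re.compile(r"(?=CA[ACGT]{7,9}AGT[ACGT]{2}AA)")


def pol2_terminator_sites(seq):
    """0-based start positions of CAN(7-9)AGTNNAA matches in seq."""
    return [m.start() for m in POL2_TERM.finditer(seq.upper())]
```

A gene "devoid of this termination sequence" in the sense of item (b) is one for which `pol2_terminator_sites` returns an empty list.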


Example 2. Chemical synthesis of a modified Btt structural gene 



(i) Synthesis Strategy 


The general plan for synthesizing linear double-stranded DNA sequences coding for the crystal protein
from Btt is shown schematically, in simplified form, in Figure 2. The optimized DNA coding sequence (Figure 1)
is divided into thirteen segments (segments A-M) to be synthesized individually, isolated and purified. As shown
in Figure 2, the general strategy begins by enzymatically joining segments A and M to form segment AM, to
which segment BL is added to form segment ABLM. Segment CK is then added enzymatically to make
segment ABCKLM, which is enlarged through the sequential addition of segments DJ, EI and FGH to give
finally the total segment ABCDEFGHIJKLM, representing the entire coding region of the Btt gene.
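The outside-in order of Figure 2 can be modelled as bookkeeping on segment labels. The Python sketch below is a hypothetical illustration, not a laboratory protocol: it adds one segment to each flank of the growing construct at every step and, as in the description, inserts the last central run (F, G and H) as a single block.

```python
def outside_in_steps(segments):
    """List the intermediate constructs when outer segment pairs are joined first."""
    left, right = 0, len(segments) - 1
    prefix, suffix = "", ""
    steps = []
    while right - left >= 3:  # four or more segments still to place
        prefix += segments[left]
        suffix = segments[right] + suffix
        steps.append(prefix + suffix)
        left += 1
        right -= 1
    # The remaining central run (F, G, H for thirteen segments) goes in as one block.
    steps.append(prefix + segments[left:right + 1] + suffix)
    return steps
```

`outside_in_steps("ABCDEFGHIJKLM")` returns AM, ABLM, ABCKLM, ABCDJKLM, ABCDEIJKLM and finally ABCDEFGHIJKLM, the sequence of products named above.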

Figure 3 outlines in more detail the strategy used in combining individual DNA segments in order to
effect the synthesis of a gene having unique restriction sites integrated into a defined nucleotide sequence.

Each of the thirteen segments (A to M) has unique restriction sites at both ends, allowing the segment to be
strategically spliced into a growing DNA polymer. Unique sites are also placed at each end of the gene to
enable easy transfer from one vector to another.
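Whether a restriction site is unique can be checked by counting its occurrences on both strands. The sketch below assumes the standard recognition sequences of enzymes named in this document (EcoRI GAATTC, BamHI GGATCC, BglII AGATCT, HindIII AAGCTT, KpnI GGTACC); the helper names are hypothetical. All five sites are palindromic, but the reverse-complement check keeps the helper correct for non-palindromic sites as well.

```python
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
    "HindIII": "AAGCTT",
    "KpnI": "GGTACC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def site_count(seq, site):
    """Occurrences of site on either strand of seq."""
    seq = seq.upper()
    k = len(site)
    targets = {site, reverse_complement(site)}
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


def unique_sites(seq):
    """Names of enzymes in SITES whose site occurs exactly once in seq."""
    return [name for name, site in SITES.items() if site_count(seq, site) == 1]
```

Run over a segment plus its vector, `unique_sites` lists the enzymes that could serve as the single splice point for that segment.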

The thirteen segments (A to M) used to construct the synthetic gene vary in size. Oligonucleotide pairs
of approximately 75 nucleotides each are used to construct larger segments of approximately 225
base pairs. Figure 3 documents the number of base pairs contained within each segment and
specifies the unique restriction sites bordering each segment. The overall strategy for incorporating
specific segments at the appropriate splice sites is also detailed in Figure 3.
(ii) Preparation of oligodeoxynucleotides

Preparation of oligodeoxynucleotides for use in the synthesis of a DNA sequence coding for the Btt crystal
protein is carried out according to the general procedures described by Matteucci et al. (1981) J. Am. Chem.
Soc. 103:3185-3191 and Beaucage et al. (1981) Tetrahedron Lett. 22:1859-1862. All oligonucleotides are
prepared by the solid-phase phosphoramidite triester coupling approach, using an Applied Biosystems
Model 380A DNA synthesizer. Deprotection and cleavage of the oligomers from the solid support are
carried out according to standard procedures. Crude oligonucleotide mixtures are purified using an
oligonucleotide purification cartridge (OPC, Applied Biosystems) as described by McBride et al. (1988)
BioTechniques 6:362-367.

Phosphorylation of oligonucleotides is performed with T4 polynucleotide kinase. The reaction
contains 2 µg oligonucleotide and 18.2 units polynucleotide kinase (Pharmacia) in linker kinase buffer (Maniatis
(1982) Cloning Manual, Fritsch and Sambrook (eds.), Cold Spring Harbor Laboratory, Cold Spring Harbor,
NY). The reaction is incubated at 37°C for 1 hour.
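For orientation, the 2 µg of oligonucleotide in the kinase reaction can be converted to picomoles of 5' ends. The Python sketch assumes an average mass of about 330 g/mol per nucleotide of single-stranded DNA (a common rule of thumb, not a figure from the source) and the approximately 75-nucleotide oligomers described above; the function name is hypothetical.

```python
def oligo_pmol(mass_ug, length_nt, g_per_mol_per_nt=330.0):
    """Approximate picomoles of a single-stranded oligonucleotide.

    Assumes an average molar mass per nucleotide (default 330 g/mol).
    pmol = mass in grams / molar mass in g/mol * 1e12.
    """
    molar_mass = length_nt * g_per_mol_per_nt  # g/mol
    return mass_ug * 1e-6 / molar_mass * 1e12
```

Under these assumptions `oligo_pmol(2, 75)` is roughly 81 pmol, so each kinase reaction phosphorylates on the order of 80 pmol of 5' ends.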
Oligonucleotides are annealed by first heating to 95°C for 5 min and then allowing complementary
pairs to cool slowly to room temperature. Annealed pairs are reheated to 65°C, and the solutions are combined,
cooled slowly to room temperature and kept on ice until used. The ligated mixture may be purified by
electrophoresis through a 4% NuSieve agarose (FMC) gel. The band corresponding to the ligated duplex is
excised, and the DNA is extracted from the agarose and ethanol precipitated.

Ligations are carried out as exemplified by the M segment ligation. The annealed M segment DNA is
brought to 65°C for 25 min, the desired vector is added, and the reaction mixture is incubated at 65°C for
15 min. The reaction is slowly cooled over 1.5 hours to room temperature. ATP (to 0.5 mM), 3.5 units of
T4 DNA ligase and salts are added, and the reaction mixture is incubated for 2 hr at room temperature and then
maintained overnight at 15°C. The next morning, vectors that had not been ligated to M block DNA are
eliminated by EcoRI digestion. Vectors ligated to the M segment DNA are used to transform
E. coli MC1061. Colonies containing inserted blocks are identified by colony hybridization with
32P-labeled oligonucleotide probes. The sequence of the DNA segment is confirmed by isolating
plasmid DNA and sequencing using the dideoxy method of Sanger et al. (1977) Proc. Natl. Acad. Sci.
74:5463-5467.
(iii) Synthesis of Segment AM

Three oligonucleotide pairs (A1 and its complementary strand A1C, A2 and A2C, and A3 and A3C) are
annealed and ligated as described above to make up segment A. The nucleotide sequence of segment A
is as follows:
In Table 4, bold lines demarcate the individual oligonucleotides. Fragment A1 contains 71 bases, A1C has
76 bases, A2 has 75 bases, A2C has 76 bases, A3 has 82 bases and A3C has 76 bases. In all, segment A
is composed of 228 base pairs and is contained between an EcoRI restriction site and a destroyed
EcoRI site (5'). (Additional restriction sites within segment A are indicated.) The EcoRI single-stranded
cohesive ends allow segment A to be annealed and then ligated to the EcoRI-cut cloning vector, pIC20K.

Segment M comprises three oligonucleotide pairs: M1, 80 bases; M1C, 86 bases; M2, 87 bases; M2C,
87 bases; M3, 85 bases; and M3C, 79 bases. The individual oligonucleotides are annealed and ligated
according to standard procedures as described above. The overall nucleotide sequence of segment M is:
In Table 5, bold lines demarcate the individual oligonucleotides. Segment M contains 252 base pairs and
has destroyed EcoRI restriction sites at both ends. (Additional restriction sites within segment M are
indicated.) Segment M is inserted into vector pIC20R at an EcoRI restriction site and cloned.
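The oligonucleotide lengths quoted for segments A and M can be cross-checked against the stated segment sizes: on each strand, the three oligomers should add up to the segment length. The Python sketch below encodes only the numbers given in the text; the data layout and function name are hypothetical.

```python
# Lengths (bases) quoted in the text.
# "top" = A1, A2, A3 (or M1, M2, M3); "bottom" = the complementary C strands.
SEGMENTS = {
    "A": {"top": [71, 75, 82], "bottom": [76, 76, 76], "bp": 228},
    "M": {"top": [80, 87, 85], "bottom": [86, 87, 79], "bp": 252},
}


def strands_consistent(seg):
    """True if each strand's oligomers sum to the stated segment length."""
    return sum(seg["top"]) == seg["bp"] and sum(seg["bottom"]) == seg["bp"]
```

Both segments pass: 71 + 75 + 82 = 76 + 76 + 76 = 228, and 80 + 87 + 85 = 86 + 87 + 79 = 252.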

As proposed in Figure 3, segment M is joined to segment A in the plasmid in which segment A is contained.
Segment M is excised at the flanking restriction sites from its cloning vector and spliced into pIC20K,
harboring segment A, through successive digestions with HindIII followed by BglII. The pIC20K vector now
comprises segment A joined to segment M with a HindIII site at the splice site (see Figure 3). Plasmid
pIC20K is derived from pIC20R by removing the ScaI-NdeI DNA fragment and inserting a HincII fragment
containing an NPTI coding region. The resulting plasmid of 4.44 kb confers resistance to kanamycin on E.
coli.
Example 3. Expression of synthetic crystal protein gene in bacterial systems

The synthetic Btt gene is designed so that it is expressed in the pIC20R-kan vector in which it is
constructed. This expression is produced using the initiation methionine of the lacZ protein of pIC20K.
The wild-type Btt crystal protein sequence expressed in this manner has full insecticidal activity. In addition,
the synthetic gene is designed to contain a BamHI site 5'-proximal to the initiating methionine codon and a
BglII site 3' to the terminal TAG translation stop codon. This facilitates the cloning of the insecticidal crystal
protein coding region into bacterial expression vectors such as pDR540 (Russell and Bennett 1982).
Plasmid pDR540 contains the TAC promoter, which allows the production of proteins, including Btt crystal
protein, under controlled conditions in amounts up to 10% of the total bacterial protein. This promoter
functions in many gram-negative bacteria, including E. coli and Pseudomonas.
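The design constraints stated here (an ATG start, a terminal TAG stop, a BamHI site upstream and a BglII site downstream) can be checked mechanically. A minimal Python sketch follows, with hypothetical function names and a toy construct rather than the actual Btt sequence; it assumes the standard BamHI (GGATCC) and BglII (AGATCT) recognition sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
BAMHI, BGLII = "GGATCC", "AGATCT"


def is_clean_orf(cds):
    """ATG start, TAG stop, whole codons, and no premature in-frame stop."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return (codons[0] == "ATG" and codons[-1] == "TAG"
            and not STOP_CODONS & set(codons[:-1]))


def properly_flanked(construct, cds):
    """BamHI site 5' of the coding region and BglII site 3' of it."""
    construct, cds = construct.upper(), cds.upper()
    start = construct.find(cds)
    if start < 0:
        return False
    end = start + len(cds)
    return BAMHI in construct[:start] and BGLII in construct[end:]
```

A construct passing both checks could be moved as a BamHI-BglII fragment into a vector such as pDR540 without disturbing the reading frame of the insert itself.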

Production of Bt insecticidal crystal protein from the synthetic gene in bacteria demonstrates that the
protein produced has the expected toxicity to coleopteran insects. These recombinant bacterial strains, which
produce the product of the synthetic gene, have potential value in themselves as microbial insecticides.



Example 4. Expression of a synthetic crystal protein gene in plants 

The synthetic Btt crystal protein gene is designed to facilitate cloning into plant expression cassettes.
These cassettes use sites compatible with the BamHI and BglII restriction sites flanking the synthetic gene.
Cassettes are available that use plant promoters including CaMV 35S, CaMV 19S and the ORF 24
promoter from T-DNA. These cassettes provide the recognition signals essential for expression of proteins
in plants. The cassettes are used in micro Ti plasmids such as pH575. Plasmids such as pH575
containing the synthetic Btt gene directed by plant expression signals are used in disarmed Agrobacterium
tumefaciens to introduce the synthetic gene into plant genomic DNA. This system has been
described previously by Adang et al. (1987) to express the Bt var. kurstaki crystal protein gene in tobacco
plants. These tobacco plants were toxic to feeding tobacco hornworms.


Example 5. Assay for insecticidal activity 

Bioassays were conducted essentially as described by Sekar, V. et al. supra. Toxicity was assessed by
an estimate of the LD50. Plasmids were grown in E. coli JM105 (Yanisch-Perron, C. et al. (1985) Gene
33:103-119). On a molar basis, no significant differences in toxicity were observed between crystal proteins
encoded by p544Pst-Met5, p544-HindIII and pNSBP544. When expressed in plants under identical
conditions, cells containing protein encoded by the synthetic gene were observed to be more toxic than
those containing protein encoded by the native Btt gene. Immunoblots ("western" blots) of cell cultures
indicated that the more toxic cultures contained more crystal protein antigen. Improved expression of the
synthetic Btt gene relative to the natural Btt gene was also evident in that specific mRNA transcripts from
the synthetic Btt genes could be quantitated on Northern blots.
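The source does not say how the LD50 was estimated; probit analysis is a common choice. As a simpler stand-in, the Python sketch below interpolates linearly between the two tested doses that bracket 50% mortality, on a log10 dose scale. It is an assumption-laden illustration with a hypothetical function name, not the inventors' method.

```python
import math


def ld50_log_interp(doses, mortality):
    """Estimate LD50 by linear interpolation of mortality against log10(dose).

    doses: positive, increasing values; mortality: fractions in [0, 1].
    Raises ValueError when no pair of adjacent doses brackets 50% mortality.
    """
    pairs = list(zip(doses, mortality))
    for (d0, m0), (d1, m1) in zip(pairs, pairs[1:]):
        if m0 <= 0.5 <= m1 and m1 > m0:
            x0, x1 = math.log10(d0), math.log10(d1)
            x = x0 + (0.5 - m0) * (x1 - x0) / (m1 - m0)
            return 10 ** x
    raise ValueError("50% mortality is not bracketed by the tested doses")
```

With purely hypothetical doses of 1, 10 and 100 units giving 20%, 40% and 60% mortality, the estimate falls halfway between 10 and 100 on the log scale, about 31.6 units. A probit fit would use all dose points rather than only the bracketing pair.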



Claims 


1. A synthetic gene designed to be highly expressed in plants comprising a DNA sequence encoding 
an insecticidal protein which is functionally equivalent to a native insecticidal protein of Bt. 

2. A synthetic gene of claim 1 wherein said DNA sequence is at least about 85% homologous to a 
native insecticidal protein gene of Btt. 

3. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1, spanning
nucleotides 1 through 1793.

4. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1 spanning 
nucleotides 1 through 1833. 

5. A synthetic gene of claim 1 wherein the overall frequency of preferred codon usage within the entire
coding region of said synthetic gene is within about 75% of the frequency of codon usage preferred in
plants.

6. A synthetic gene of claim 1 wherein the A + T base content of said DNA sequence is substantially 
equal to the A + T base content found in plant structural genes. 

7. A synthetic gene of claim 1 wherein a plant initiation sequence is present at the 5' end of the coding
region.

8. A synthetic gene of claim 1 wherein plant polyadenylation signals, comprising those having
AATAAA, AATGAA, AATAAT, AATATT, GATAAA and AATAAG motifs, are eliminated in said DNA
sequence.

9. A synthetic gene of claim 1 wherein the polymerase II termination sequence, CAN(7-9)AGTNNAA, is
eliminated in said DNA sequence.

10. A synthetic gene of claim 1 wherein CUUCGG hairpins are eliminated in said DNA sequence. 

11. A synthetic gene of claim 1 wherein plant consensus splice sites, including 5' = AAG:GTAAGT and
3' = TTTT(Pu)TTT(Pu)T(Pu)T(Pu)T(Pu)TGCAG:C, are eliminated in said DNA sequence.

12. A synthetic gene of claim 1 wherein the CG and TA doublet avoidance indices are substantially
equal to those of highly expressed genes in the selected host plant.

13. A recombinant DNA cloning vector comprising said synthetic gene of claim 1. 

14. A plant cell which contains the synthetic gene of claim 1.

15. An improved method of producing a protein toxic to an insect comprising the step of introducing
into a host plant cell a DNA segment comprising a synthetic gene designed to be highly expressed in 
plants comprising a DNA sequence encoding an insecticidal protein which is functionally equivalent to a 
native insecticidal protein of Bt such that said synthetic gene is expressed in said plant host. 
Figure 2. Simplified Scheme
FIGURE 3